-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrating to Google Auth Library #180
Conversation
@shcheklein Not sure if you've seen this |
@junpeng-jp hi, yes! I've seen ... I'll definitely come and review this! Thanks for making this PR 🙏 |
I wasn't very thorough with adding new test cases for some of the new errors in google-auth library. Let me know if I've missed something in the code |
@junpeng-jp no worries, we can add tests later if needed. First things I would be looking for are:
Again, I'll take a look as soon as I can also. |
For multi-threading, I've updated the library to follow the recommendation here by the authors of google-api-python-client. Change all references to |
does it mean it creates and authorizes a new object per every request? It might mean that we won't have the connections pool, and in general it's more expensive. One more question - is refresh token logic now handled in the Authorize call? does it save back the new token into the file? |
I'm kind of new to multiprocessing in general. I've largely referenced this old article & the previous linked documentation by google on how to share credentials across threads
There's no need to manually refresh credentials anymore if i'm not wrong. The in-memory Credential objects would have the latest token & shared across all threads and their AuthorizedHttp objects. Refresh requests are made automatically by the AuthorizedHttp / AuthorizedSession / any other transport related objects in google-auth library whenever they're making a request to the API (usually in the If backend is specified, credentials are saved only once after the user provides consent & exchanges the access token + refresh token. A new Credential object can be created from the file if the in memory Credentials were deleted. This new credential is treated as an expired credential & will automatically refresh by the Http / Session objects using the saved refresh token |
Yep. But ideally it's cached per thread (it's shared across multiple requests within the same thread). There was logic with thread local: https://github.com/iterative/PyDrive2/blob/master/pydrive2/auth.py#L78-L80
Got it. Need to check since previously it was saving it back on each refresh, I wonder what is the difference here (clearly it means that it will be running refresh every time you restart the script, but are there any other consequences?) |
self.auth.Authorize() | ||
|
||
# Ensure that a thread-safe HTTP object is provided. | ||
if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like providing a custom http
into a request won't work anymore. It might be needed to scenarios like proxy servers, etc. When users take responsibility in providing and managing HTTP clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you share some examples of how this decorator can be used to provide a custom http
into a request? I'm not really familiar with the above scenarios
Regarding the previous discussion on caching of Http object per thread, I will add thread_local back to the auth object now that I understand its purpose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junpeng-jp any progress on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this as part of a new lazily-loaded property called authorized_http
. It would create a AuthorizedHttp
object and add it to self.thread_local
if it doesn't exist. Else, just retrieve it from self.thread_local
. Tagged you below
pydrive2/auth.py
Outdated
@@ -147,20 +65,6 @@ class GoogleAuth(ApiAttributeMixin): | |||
and authorization. | |||
""" | |||
|
|||
DEFAULT_SETTINGS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: why do we move the outside the class? (it will break some project that rely on this to be inside)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't really expect anybody to use this outside of the GoogleAuth constructor. I can definitely add it back as a class static variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use it anymore in DVC, still let's not move things in this PR, let's focus on the migration first, please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted the DEFAULT_SETTINGS to a class static variable
@CheckAuth | ||
if self.storage: | ||
self.SaveCredentials() | ||
|
||
def CommandLineAuth(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to replace it here with LocalWebServer (we are changing it's behavior) - it's better to get rid of it completely (and we will bump the major version)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to understand what are the use-cases of loading credentials from a backend. For now, I imagine only the following use-case:
App wants to cache the user's token and refresh token so that the next time he returns, the App can simply identify the user, retrieve the authorized user credentials from backend & silently refresh the token without a need for the User to re-auth & exchange a new token
In this case, you would:
- Save the authorized user credentials after the first authentication store the user's identity & refresh token for silent re-auth
- Whenever a new session starts, identify the user, retrieve authorized credentials from backend, and create a new GoogleAuth object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For service auth use-cases, I can't think of a reason to duplicate your service account private key elsewhere besides the first known location of this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to understand what are the use-cases of loading credentials from a backend.
hmm, I'm not sure I understand how this question is relevant to the comment I made, could you clarify please - My point is that it's a good opportunity to completely get rid of the CommandLineAuth, wdyt?
Could you clarify your comments please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i might have misread this somehow. Sure, I think we can get rid of CommandLineAuth. Should I raise a deprecation Warning or just delete it outright?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can raise "NotImplemented" or something with an explanation, sounds good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've raised a DeprecationWarning Error which tells the users to use the local webserver auth with a loopback address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still, let's not fall back to the serve implementation. It's better to raise an exception in this case. Changing behavior this way can be unexpected / lead to some unintended consequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it will be raising an exception here
@junpeng-jp sorry for the delay, I've been busy, but this PR is important and I'll get to it soon. |
@shcheklein @junpeng-jp i need this PR, any idea when will it be merged? |
@lappazos sorry, it's on me. I need to come back and do the next iteration of reviewing it. |
and "http" in kwargs["param"] | ||
and kwargs["param"]["http"] is not None | ||
): | ||
self.http = kwargs["param"]["http"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are still this logic, I think the way it worked was something like file.Upload(param={http=...})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add back this functionality
pydrive2/auth.py
Outdated
@property | ||
def authorized_http(self): | ||
# Ensure that a thread-safe, Authorized HTTP object is provided | ||
# If HTTP object not specified, create or resuse an HTTP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is outdated / copy-pasted ... HTTP object could not be provided here
pydrive2/auth.py
Outdated
# The Installed App Flow raises OSError when binding to a used port | ||
|
||
# More on TCP state changes: https://users.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Transition_Diagram.pdf | ||
if e.errno not in (10048, 48): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I still don't understand this logic - my understanding is that we can just try another port in case of an error (any OSError).
except OSError:
pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhhh yeahh. There's no need for me to specifically catch these 2 error codes. Will update this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
# in-case reauth / consent required | ||
# create a flow object so that reauth flow can be triggered | ||
additional_config = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, don't quite understand this. How was this handled before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the whole create a flow object so that reauth flow can be triggered
part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the past, self.flow
was populated using self.GetFlow()
as part of the CheckAuth
decorator, which handles the whole refresh process. Now that refresh is handled by the google_auth
lib, I've opted to create the Flow object within the self.LoadCredentialsFile()
using the recently loaded credentials
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but you also have that lazy loading property flow
that is being used internally, or do you mean that we have to create it again in certain cases?
also, btw, I think flow()
now duplicates GetFlow
which is not nice.
@@ -75,6 +75,7 @@ | |||
"client_service_email": {"type": str, "required": False}, | |||
"client_pkcs12_file_path": {"type": str, "required": False}, | |||
"client_json_file_path": {"type": str, "required": False}, | |||
"use_default": {"type": bool, "required": False}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we skip for the sake of this PR the default credentials auth? I think it should be done outside of the ServiceAuth + probably we should be using a different setting for this (don't know which one yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll shift this into a subsequent PR once we've migrated to new google auth library
@shcheklein I realised that there's quite still a lot of issues with this PR that needs fixing but I was wondering if it's okay for me to close this PR & raise a new one? There's been a large number of new commits over the 1month gap that I'm having trouble resolving merge conflicts I'll start threads on the unresolved comments above once i've raised a new PR |
Yes, it'a absolutely okay, nw. Thanks for doing this! |
Closing this PR in favor of a new one @ #221. Will be raising the pull request soon |
Summary of changes
Potentially solves the following issue as well:
google.auth.default
#179)google.auth.default
is added to ServiceAuth w/ a newuse_default
config that controls this behaviour